Semi-supervised Learning of Naive Bayes Classifier with feature constraints

Author

  • Nagesh Bhat
Abstract

Semi-supervised learning methods address the problem of building classifiers when labeled data is scarce. Text classification is often augmented by a rich set of labeled features representing a particular class. As tuple-level labeling is resource-consuming, semi-supervised and weakly supervised learning methods have recently been explored. Compared to labeling data instances (documents), feature labeling takes much less effort and time. Posterior regularization (PR) is a recently proposed framework for incorporating bias, in the form of prior knowledge, into the posterior for the label. Our work focuses on incorporating labeled features into a Naive Bayes classifier in a semi-supervised setting using PR. Generative learning approaches utilize unlabeled data more effectively than discriminative approaches in a semi-supervised setup. In the current study we formulate a classification method that uses the labeled features as constraints on the posterior in a semi-supervised generative learning setting. Our empirical study shows that performance gains are significant compared to an approach based solely on Generalized Expectation (GE) or on a limited amount of labeled data alone. We also show an application of our framework in a transfer learning setup for text classification. As we allow labeled data as well as labeled features to be used, our setup allows for a limited amount of labeled data on the target side of transfer learning, where feature constraints are used for transferring knowledge from the source domain to the target domain.
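
To make the idea of constraining the label posterior concrete, the following is a minimal sketch of the projection step commonly used in posterior regularization, written for a single labeled feature. It is an illustration under stated assumptions and not the formulation used in the paper: the function name `project_posteriors`, the constraint form ("among documents containing the labeled feature, the average probability of the associated class is at least `min_fraction`"), and the bisection search for the multiplier are all hypothetical choices. The input posteriors stand in for the class probabilities a Naive Bayes model would assign to unlabeled documents.

```python
import math

def project_posteriors(posteriors, target_class, min_fraction, max_iter=60):
    """KL-project per-document posteriors so that the average probability
    of target_class is at least min_fraction (hypothetical constraint form).

    posteriors: list of class distributions, one per document that
    contains the labeled feature.
    """
    def reweight(lam):
        # Closed form of the projection: q(y) is proportional to
        # p(y) * exp(lam) for the target class, p(y) otherwise.
        projected = []
        for p in posteriors:
            w = [pk * (math.exp(lam) if k == target_class else 1.0)
                 for k, pk in enumerate(p)]
            z = sum(w)
            projected.append([wk / z for wk in w])
        return projected

    def mean_target(q):
        return sum(row[target_class] for row in q) / len(q)

    # If the constraint already holds, the projection is the identity.
    if mean_target(posteriors) >= min_fraction:
        return [list(p) for p in posteriors]

    # Bracket the multiplier, then bisect; the constrained expectation
    # increases monotonically with lam.
    lo, hi = 0.0, 1.0
    while mean_target(reweight(hi)) < min_fraction and hi < 50.0:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mean_target(reweight(mid)) < min_fraction:
            lo = mid
        else:
            hi = mid
    return reweight(hi)

# Three unlabeled documents that contain a feature labeled for class 1.
docs = [[0.6, 0.4], [0.7, 0.3], [0.5, 0.5]]
q = project_posteriors(docs, target_class=1, min_fraction=0.8)
```

In a semi-supervised EM loop, projected posteriors such as `q` would replace the raw model posteriors when re-estimating the Naive Bayes parameters. Handling several labeled features at once would require solving for one multiplier per constraint jointly, which this sketch does not attempt.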

Similar resources

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the rapid increase in the number of documents, using Text Document Classification (TDC) methods has become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and a Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the large size of the feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...

Word Sense Disambiguation Using Semi-Supervised Naive Bayes with Ontological Constraints

Background. Word sense disambiguation (WSD) is the task of mapping an ambiguous word to its correct sense given its context. As high-quality sense-tagged data is scarce and expensive to obtain, attention has shifted from fully supervised to semi-supervised and knowledge-based approaches to WSD that rely on a lexical knowledge base such as WordNet instead of large amounts of hand-labeled data. Wha...

Scaling Semi-supervised Naive Bayes with Feature Marginals

Semi-supervised learning (SSL) methods augment standard machine learning (ML) techniques to leverage unlabeled data. SSL techniques are often effective in text classification, where labeled data is scarce but large unlabeled corpora are readily available. However, existing SSL techniques typically require multiple passes over the entirety of the unlabeled data, meaning the techniques are not ap...

Semi-supervised Learning Based Aesthetic Classifier for Short Animations Embedded in Web Pages

We propose a semi-supervised learning based computational model for aesthetic classification of short animation videos, which are nowadays part of many web pages. The proposed model is expected to be useful in developing an overall aesthetic model of web pages, leading to better evaluation of web page usability. We identified two feature sets describing aesthetics of an animated video. Based on...

Large Scale Text Classification using Semisupervised Multinomial Naive Bayes

Numerous semi-supervised learning methods have been proposed to augment Multinomial Naive Bayes (MNB) using unlabeled documents, but their use in practice is often limited due to implementation difficulty, inconsistent prediction performance, or high computational cost. In this paper, we propose a new, very simple semi-supervised extension of MNB, called Semi-supervised Frequency Estimate (SFE)...

Publication date: 2013